Skip to content

feat: python based catalog and schema provider #1156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 2, 2025

Conversation

timsaucer
Copy link
Contributor

@timsaucer timsaucer commented Jun 18, 2025

Which issue does this PR close?

Closes #1091

Rationale for this change

This PR builds on top of #1137 and adds python based schema and catalog providers.

What changes are included in this PR?

Creates wrapper types around python based schema and catalog providers.
Adds checks for when we are going python->rust->python and short circuit to return the python original object rather than wrapper.
Adds unit test

Are there any user-facing changes?

No

Status

Leaving as draft until the following are complete:

  • We need abstract base classes for the catalog and schema providers for users to implement
  • Verify all combinations of default vs user provided python catalog, schema, table providers work in unit tests - this is mostly done
  • Add full end-to-end test of registering custom catalog, schema, table and then do a SQl query against it
  • Update online documentation

@timsaucer
Copy link
Contributor Author

FYI @renato2099 getting the python based providers ended up being a blocking issue for some of my work so I took a stab at implementing it. Please tell me what you think if you have some time.

@timsaucer timsaucer force-pushed the feat/python-catalog-provider branch from dc715fe to 2584075 Compare June 18, 2025 13:35
@timsaucer timsaucer self-assigned this Jun 19, 2025
@timsaucer timsaucer marked this pull request as ready for review June 19, 2025 20:45
@renato2099
Copy link
Contributor

Hi @timsaucer ,

I am sorry I wasn't able to complete this in time, but I had it still on my radar. I pushed my version yesterday after the holidays :) #1137 and after a quick look at your PR, it seems they are relatively similar.
One thing is that you declared a new "catalog" module whereas I introduced a rust wrapper class for the python providers ... but it is very minor difference tbh , anyway let me take a closer look after work today though and provide some more useful input here

@renato2099
Copy link
Contributor

Hi @timsaucer ,

thanks for pushing on this, I think the main difference is that you are extending the PyCatalog whereas I am introducing a new PyCatalogProvider. Both approaches have their pros/cons, imho keeping the separation between catalogs and catalog_providers might make things clearer at the interface level.

That said I don't have a feel what datafusion-python users might find better, i.e., more rigid+"clearer" APIs vs more fluid+a bit less clearer API by accepting different object types (catalogs and catalog_providers when registering providers). Maybe it is more pythonic to accept different object types?

PR looks good to me overall though, thanks again! and sorry for the delay on my side :(

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @timsaucer and @renato2099 🚀

@timsaucer timsaucer force-pushed the feat/python-catalog-provider branch from ca3cb36 to a6baba1 Compare July 2, 2025 11:08
@timsaucer timsaucer merged commit 9362f53 into apache:main Jul 2, 2025
17 checks passed
@timsaucer timsaucer deleted the feat/python-catalog-provider branch July 2, 2025 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CatalogProvider and SchemaProvider
3 participants